52 research outputs found
Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Power Analysis and Sample Size Estimation
This study's first purpose is to provide quantitative evidence that would
incentivize researchers to instead use the more robust method of nested
cross-validation. The second purpose is to present methods and MATLAB codes for
doing power analysis for ML-based analysis during the design of a study. Monte
Carlo simulations were used to quantify the interactions between the employed
cross-validation method, the discriminative power of features, the
dimensionality of the feature space, and the dimensionality of the model. Four
different cross-validations (single holdout, 10-fold, train-validation-test,
and nested 10-fold) were compared based on the statistical power and
statistical confidence of the ML models. Distributions of the null and
alternative hypotheses were used to determine the minimum required sample size
for obtaining a statistically significant outcome (α = 0.05,
1 − β = 0.8). Statistical confidence of the model was defined as the
probability of correct features being selected and hence being included in the
final model. Our analysis showed that the model generated based on the single
holdout method had very low statistical power and statistical confidence and
that it significantly overestimated the accuracy. Conversely, the nested
10-fold cross-validation resulted in the highest statistical confidence and the
highest statistical power, while providing an unbiased estimate of the
accuracy. The required sample size with a single holdout could be 50% higher
than what would be needed if nested cross-validation were used. Confidence in
the model based on nested cross-validation was as much as four times higher
than the confidence in the single holdout-based model. A computational model,
MATLAB codes, and lookup tables are provided to assist researchers with
estimating the sample size during the design of their future studies.
Comment: Under review at JSLH
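The power analysis described above can be illustrated with a minimal Python sketch. It is not the MATLAB code from the study. It assumes a single Gaussian feature, a nearest-class-mean classifier, and plain k-fold cross-validation, with no feature selection and no nested loop. The function names (`cv_accuracy`, `estimate_power`, `minimum_sample_size`) and all default values are hypothetical. The critical accuracy is taken from the simulated null distribution (effect size 0) at α = 0.05, and power is the fraction of simulations under the alternative that exceed it.

```python
import random
import statistics


def cv_accuracy(n_per_class, effect_size, k, rng):
    """k-fold cross-validated accuracy of a nearest-class-mean classifier.

    Assumption: one feature, unit-variance Gaussian classes whose means
    differ by `effect_size`. Use k >= 3 so every training split contains
    both classes.
    """
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(effect_size, 1.0), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [s for i, s in enumerate(data) if i % k != fold]
        mean0 = statistics.fmean(x for x, y in train if y == 0)
        mean1 = statistics.fmean(x for x, y in train if y == 1)
        for x, y in test:
            pred = 0 if abs(x - mean0) <= abs(x - mean1) else 1
            correct += (pred == y)
    return correct / len(data)


def estimate_power(n_per_class, effect_size, k=5, n_sims=300, alpha=0.05, seed=0):
    """Monte Carlo estimate of statistical power at a given sample size."""
    rng = random.Random(seed)
    # Null distribution: the feature carries no class information.
    null = sorted(cv_accuracy(n_per_class, 0.0, k, rng) for _ in range(n_sims))
    critical = null[min(n_sims - 1, int((1 - alpha) * n_sims))]
    # Alternative distribution: the feature separates the classes.
    alt = [cv_accuracy(n_per_class, effect_size, k, rng) for _ in range(n_sims)]
    return sum(a > critical for a in alt) / n_sims


def minimum_sample_size(effect_size, candidates, target_power=0.8, **kwargs):
    """Smallest candidate size (per class) reaching the target power, else None."""
    for n in candidates:
        if estimate_power(n, effect_size, **kwargs) >= target_power:
            return n
    return None
```

Calling `minimum_sample_size(0.8, range(10, 101, 10))` scans per-class sample sizes from 10 to 100. Swapping the cross-validation scheme inside `cv_accuracy` is the place where a comparison between holdout, k-fold, and nested designs would go.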
Statistical properties of linear prediction analysis underlying the challenge of formant bandwidth estimation
Formant bandwidth estimation is often observed to be more challenging than the estimation of formant center frequencies due to the presence of multiple glottal pulses within a period and short closed-phase durations. This study explores inherently different statistical properties between linear prediction (LP)–based estimates of formant frequencies and their corresponding bandwidths that may be explained in part by the statistical bounds on the variances of estimated LP coefficients. A theoretical analysis of the Cramér-Rao bounds on LP estimator variance indicates that the accuracy of bandwidth estimation is approximately twice as low as that of center frequency estimation. Monte Carlo simulations of all-pole vowels with stochastic and mixed-source excitation demonstrate that the distributions of estimated LP coefficients exhibit expectedly different variances for each coefficient. Transforming the LP coefficients to formant parameters results in variances of bandwidth estimates being typically larger than the variances of respective center frequency estimates, depending on vowel type and fundamental frequency. These results provide additional evidence underlying the challenge of formant bandwidth estimation due to inherent statistical properties of LP-based speech analysis.
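The transformation from LP coefficients to formant parameters mentioned above goes through the poles of the all-pole filter. The sketch below uses the standard textbook mapping between one complex pole and a formant: the pole angle gives the center frequency and the pole radius gives the bandwidth. It is an illustration, not the analysis code of the study, and the function names are hypothetical.

```python
import cmath
import math


def pole_to_formant(pole, fs):
    """Map one complex pole of an all-pole (LP) filter to (frequency, bandwidth) in Hz.

    frequency = angle(pole) * fs / (2*pi)
    bandwidth = -ln(|pole|) * fs / pi
    """
    freq = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth


def formant_to_pole(freq, bandwidth, fs):
    """Inverse mapping: build the upper-half-plane pole for a given formant."""
    radius = math.exp(-math.pi * bandwidth / fs)
    angle = 2.0 * math.pi * freq / fs
    return cmath.rect(radius, angle)
```

For example, `pole_to_formant(formant_to_pole(500.0, 80.0, 10000.0), 10000.0)` recovers approximately 500 Hz and 80 Hz. Because the bandwidth depends on the logarithm of a radius close to 1, small errors in the estimated pole radius translate into comparatively large relative errors in bandwidth, which is consistent with the difficulty the abstract describes.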
Direct measurement and modeling of intraglottal, subglottal, and vocal fold collision pressures during phonation in an individual with a hemilaryngectomy
The purpose of this paper is to report on the first in vivo application of a recently developed transoral, dual-sensor pressure probe that directly measures intraglottal, subglottal, and vocal fold collision pressures during phonation. Synchronous measurement of intraglottal and subglottal pressures was accomplished using two miniature pressure sensors mounted on the end of the probe and inserted transorally in a 78-year-old male who had previously undergone surgical removal of his right vocal fold for treatment of laryngeal cancer. The endoscopist used one hand to position the custom probe against the surgically medialized scar band that replaced the right vocal fold and used the other hand to position a transoral endoscope to record laryngeal high-speed videoendoscopy of the vibrating left vocal fold contacting the pressure probe. Visualization of the larynx during sustained phonation allowed the endoscopist to place the dual-sensor pressure probe such that the proximal sensor was positioned intraglottally and the distal sensor subglottally. The proximal pressure sensor was verified to be in the strike zone of vocal fold collision during phonation when the intraglottal pressure signal exhibited three characteristics: an impulsive peak at the start of the closed phase, a rounded peak during the open phase, and a minimum value around zero immediately preceding the impulsive peak of the subsequent phonatory cycle. Numerical voice production modeling was applied to validate model-based predictions of vocal fold collision pressure using kinematic vocal fold measures. The results successfully demonstrated feasibility of in vivo measurement of vocal fold collision pressure in an individual with a hemilaryngectomy, motivating ongoing data collection that is designed to aid in the development of vocal dose measures that incorporate vocal fold impact collision and stresses.
Affiliations: Mehta, Daryush D. (Massachusetts General Hospital, United States); Kobler, James B. (Massachusetts General Hospital, United States); Zeitels, Steven M. (Harvard Medical School, Department of Medicine, Massachusetts General Hospital, United States); Zañartu, Matías (Universidad Técnica Federico Santa María, Chile); Ibarra, Emiro J. (Universidad Técnica Federico Santa María, Chile); Alzamendi, Gabriel Alejandro (Universidad Nacional de Entre Ríos, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática - Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Santa Fe, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, Argentina); Manriquez, Rodrigo (Universidad Técnica Federico Santa María, Chile); Erath, Byron D. (Clarkson University, United States); Peterson, Sean D. (University of Waterloo, Canada); Petrillo, Robert H. (Center For Laryngeal Surgery and Voice Rehabilitation, United States); Hillman, Robert E. (Center For Laryngeal Surgery and Voice Rehabilitation, United States, and Harvard Medical School, Department of Medicine, Massachusetts General Hospital, United States)
Estimation of Subglottal Pressure, Vocal Fold Collision Pressure, and Intrinsic Laryngeal Muscle Activation From Neck-Surface Vibration Using a Neural Network Framework and a Voice Production Model
The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck surface accelerometer and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This non-linear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels. 
The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively, which are comparable with previous studies but with the key advantage of not requiring subject-specific training and yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.
Affiliations: Ibarra, Emiro J. (Universidad Tecnica Federico Santa Maria, Chile); Parra, Jesús A. (Universidad Tecnica Federico Santa Maria, Chile); Alzamendi, Gabriel Alejandro (Universidad Nacional de Entre Ríos, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática - Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Santa Fe, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, Argentina); Cortés, Juan P. (Universidad Tecnica Federico Santa Maria, Chile); Espinoza, Víctor M. (Universidad de Chile, Chile); Mehta, Daryush D. (Center For Laryngeal Surgery And Voice Rehabilitation, United States); Hillman, Robert E. (Center For Laryngeal Surgery And Voice Rehabilitation, United States); Zañartu, Matías (Universidad Tecnica Federico Santa Maria, Chile)
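The two error metrics and the unit conversion quoted above can be reproduced with a short Python sketch. It assumes the conventional conversion factor 1 cm H2O = 98.0665 Pa. The function names are hypothetical, and no data from the study are used.

```python
import math

PA_PER_CM_H2O = 98.0665  # conventional value of 1 cm H2O in pascals


def pa_to_cm_h2o(pressure_pa):
    """Convert a pressure from pascals to centimeters of water."""
    return pressure_pa / PA_PER_CM_H2O


def mean_absolute_error(estimates, references):
    """Average magnitude of the estimation error."""
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(references)


def root_mean_square_error(estimates, references):
    """Square root of the mean squared estimation error."""
    squared = sum((e - r) ** 2 for e, r in zip(estimates, references))
    return math.sqrt(squared / len(references))
```

With this factor, `pa_to_cm_h2o(191)` rounds to 1.95 and `pa_to_cm_h2o(243)` rounds to 2.48, matching the values reported in the abstract.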
Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update
Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating among hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders.
Method for Horizontal Calibration of Laser-Projection Transnasal Fiberoptic High-Speed Videoendoscopy
Objective: Calibrated horizontal measurements (e.g., mm) from endoscopic procedures could be utilized for advancement of evidence-based practice and personalized medicine. However, the size of an object in endoscopic images is not readily calibrated and depends on multiple factors, including the distance between the endoscope and the target surface. Additionally, acquired images may have significant non-linear distortion that would further complicate calibrated measurements. This study used a recently developed in vivo laser-projection fiberoptic laryngoscope and proposes a method for calibrated spatial measurements. Method: A set of circular grids was recorded at multiple working distances. A statistical model was trained that would map from pixel length of the object, the working distance, and the spatial location of the target object into its mm length. Result: A detailed analysis of the performance of the proposed method is presented. The analyses have shown that the accuracy of the proposed method does not depend on the working distance and length of the target object. The estimated average magnitude of error was 0.27 mm, which is three times lower than the existing alternative. Conclusion: The presented method can achieve sub-millimeter accuracy in horizontal measurement. Significance: Evidence-based practice and personalized medicine could significantly benefit from the proposed method. Implications of the findings for other endoscopic procedures are also discussed.
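The idea of mapping a pixel length and a working distance to a length in mm can be illustrated with a deliberately simplified Python sketch. It is not the statistical model of the study. It assumes an ideal pinhole camera, in which physical length is proportional to pixel length times working distance, and it ignores the spatial location of the object and the non-linear distortion that the abstract highlights. The function names and the single scale parameter `c` are hypothetical.

```python
def fit_scale(samples):
    """Least-squares fit of c in: length_mm ~ c * length_px * distance_mm.

    `samples` is an iterable of (length_px, distance_mm, length_mm) tuples,
    for example measured on a calibration grid of known size.
    """
    samples = list(samples)
    numerator = sum(px * dist * mm for px, dist, mm in samples)
    denominator = sum((px * dist) ** 2 for px, dist, mm in samples)
    return numerator / denominator


def pixels_to_mm(length_px, distance_mm, c):
    """Apply the fitted pinhole scale to convert a pixel length to mm."""
    return c * length_px * distance_mm
```

A model in the spirit of the abstract would add the spatial location of the object as a further input so that lens distortion away from the image center can be compensated.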
- …